8 research outputs found

    Variational Inference for Stochastic Block Models from Sampled Data

    Full text link
    This paper deals with non-observed dyads during the sampling of a network and consecutive issues in the inference of the Stochastic Block Model (SBM). We review sampling designs and recover Missing At Random (MAR) and Not Missing At Random (NMAR) conditions for the SBM. We introduce variants of the variational EM algorithm for inferring the SBM under various sampling designs (MAR and NMAR) all available as an R package. Model selection criteria based on Integrated Classification Likelihood are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We explore two real-world networks from ethnology (seed circulation network) and biology (protein-protein interaction network), where the interpretations considerably depends on the sampling designs considered

    Impact de l’échantillonnage sur l’inférence de structures dans les réseaux : application aux réseaux d’échanges de graines et à l’écologie

    No full text
    In this thesis we are interested in studying the stochastic block model (SBM) in the presence of missing data. We propose a classification of missing data into two categories Missing At Random and Not Missing At Random for latent variable models according to the model described by D. Rubin. In addition, we have focused on describing several network sampling strategies and their distributions. The inference of SBMs with missing data is made through an adaptation of the EM algorithm : the EM with variational approximation. The identifiability of several of the SBM models with missing data has been demonstrated as well as the consistency and asymptotic normality of the maximum likelihood estimators and variational approximation estimators in the case where each dyad (pair of nodes) is sampled independently and with equal probability. We also looked at SBMs with covariates, their inference in the presence of missing data and how to proceed when covariates are not available to conduct the inference. Finally, all our methods were implemented in an R package available on the CRAN. A complete documentation on the use of this package has been written in addition.Dans cette thèse nous nous intéressons à l’étude du modèle à bloc stochastique (SBM) en présence de données manquantes. Nous proposons une classification des données manquantes en deux catégories Missing At Random et Not Missing At Random pour les modèles à variables latentes suivant le modèle décrit par D. Rubin. De plus, nous nous sommes attachés à décrire plusieurs stratégies d’échantillonnages de réseau et leurs lois. L’inférence des modèles de SBM avec données manquantes est faite par l’intermédiaire d’une adaptation de l’algorithme EM : l’EM avec approximation variationnelle. L’identifiabilité de plusieurs des SBM avec données manquantes a pu être démontrée ainsi que la consistance et la normalité asymptotique des estimateurs du maximum de vraisemblance et des estimateurs avec approximation variationnelle dans le cas où chaque dyade (paire de nœuds) est échantillonnée indépendamment et avec même probabilité. Nous nous sommes aussi intéressés aux modèles de SBM avec covariables, à leurs inférence en présence de données manquantes et comment procéder quand les covariables ne sont pas disponibles pour conduire l’inférence. Finalement, toutes nos méthodes ont été implémenté dans un package R disponible sur le CRAN. Une documentation complète sur l’utilisation de ce package a été écrite en complément

    Impact of sampling on structure inference in networks : application to seed exchange networks and to ecology

    No full text
    Dans cette thèse nous nous intéressons à l’étude du modèle à bloc stochastique (SBM) en présence de données manquantes. Nous proposons une classification des données manquantes en deux catégories Missing At Random et Not Missing At Random pour les modèles à variables latentes suivant le modèle décrit par D. Rubin. De plus, nous nous sommes attachés à décrire plusieurs stratégies d’échantillonnages de réseau et leurs lois. L’inférence des modèles de SBM avec données manquantes est faite par l’intermédiaire d’une adaptation de l’algorithme EM : l’EM avec approximation variationnelle. L’identifiabilité de plusieurs des SBM avec données manquantes a pu être démontrée ainsi que la consistance et la normalité asymptotique des estimateurs du maximum de vraisemblance et des estimateurs avec approximation variationnelle dans le cas où chaque dyade (paire de nœuds) est échantillonnée indépendamment et avec même probabilité. Nous nous sommes aussi intéressés aux modèles de SBM avec covariables, à leurs inférence en présence de données manquantes et comment procéder quand les covariables ne sont pas disponibles pour conduire l’inférence. Finalement, toutes nos méthodes ont été implémenté dans un package R disponible sur le CRAN. Une documentation complète sur l’utilisation de ce package a été écrite en complément.In this thesis we are interested in studying the stochastic block model (SBM) in the presence of missing data. We propose a classification of missing data into two categories Missing At Random and Not Missing At Random for latent variable models according to the model described by D. Rubin. In addition, we have focused on describing several network sampling strategies and their distributions. The inference of SBMs with missing data is made through an adaptation of the EM algorithm : the EM with variational approximation. The identifiability of several of the SBM models with missing data has been demonstrated as well as the consistency and asymptotic normality of the maximum likelihood estimators and variational approximation estimators in the case where each dyad (pair of nodes) is sampled independently and with equal probability. We also looked at SBMs with covariates, their inference in the presence of missing data and how to proceed when covariates are not available to conduct the inference. Finally, all our methods were implemented in an R package available on the CRAN. A complete documentation on the use of this package has been written in addition

    missSBM: An R Package for Handling Missing Values in the Stochastic Block Model

    Get PDF
    32 pagesInternational audienceThe Stochastic Block Model (SBM) is a popular probabilistic model for random graphs. It is commonly used for clustering network data by aggregating nodes that share similar connectivity patterns into blocks. When fitting an SBM to a network which is partially observed, it is important to take into account the underlying process that generates the missing values, otherwise the inference may be biased. This paper introduces missSBM, an R-package fitting the SBM when the network is partially observed, i.e., the adjacency matrix contains not only 1's or 0's encoding presence or absence of edges but also NA's encoding missing information between pairs of nodes. This package implements a set of algorithms for fitting the binary SBM, possibly in the presence of external covariates, by performing variational inference adapted to several observation processes. Our implementation automatically explores different block numbers to select the most relevant model according to the Integrated Classification Likelihood (ICL) criterion. The ICL criterion can also help determine which observation process better corresponds to a given dataset. Finally, missSBM can be used to perform imputation of missing entries in the adjacency matrix. We illustrate the package on a network data set consisting of interactions between political blogs sampled during the French presidential election in 2007

    Variational Inference for Stochastic Block Models from Sampled Data

    No full text
    This paper deals with non-observed dyads during the sampling of a network and consecutive issues in the Stochastic Block Model (SBM) inference. We review sampling designs and recover Missing At Random (MAR) and Not Missing At Random (NMAR) conditions for SBM. We introduce several variants of the variational EM (VEM) algorithm for inferring the SBM under various sampling designs (MAR and NMAR). The sampling design must be taken into account only in the NMAR case. Model selection criteria based on Integrated Classification Likelihood (ICL) are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We finally explore two real-world networks from ethnology (seed circulation network) and biology (protein-protein interaction network), where the interpretations considerably depends on the sampling designs considered

    Variational Inference for Stochastic Block Models from Sampled Data

    No full text
    This paper deals with non-observed dyads during the sampling of a network and consecutive issues in the Stochastic Block Model (SBM) inference. We review sampling designs and recover Missing At Random (MAR) and Not Missing At Random (NMAR) conditions for SBM. We introduce several variants of the variational EM (VEM) algorithm for inferring the SBM under various sampling designs (MAR and NMAR). The sampling design must be taken into account only in the NMAR case. Model selection criteria based on Integrated Classification Likelihood (ICL) are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We finally explore two real-world networks from ethnology (seed circulation network) and biology (protein-protein interaction network), where the interpretations considerably depends on the sampling designs considered
    corecore